266 PART 5 Looking for Relationships with Correlation and Regression

Heads Up: Knowing What Can Go Wrong with Logistic Regression

Logistic regression presents many of the same potential pitfalls as ordinary least-squares regression (see Chapters 16 and 17), as well as several that are specific to logistic regression. Watch out for some of the more common pitfalls:

» Don’t fit a logistic function to non-logistic data: Don’t use logistic regression to fit data that doesn’t behave like the logistic S curve. Plot your grouped data (as shown earlier in Figure 18-1b), and if it’s clear that the fraction of positive outcomes isn’t leveling off at Y = 0 or Y = 1 for very large or very small X values, then logistic regression is not the correct modeling approach. The H-L test described earlier under the section “Assessing the adequacy of the model” provides a statistical test to determine if your data qualify for logistic regression. Also, in Chapter 19, we describe a more generalized logistic model that contains other parameters for the upper and lower leveling-off values.

» Watch out for collinearity and disappearing significance: When you are doing any kind of regression and two or more predictor variables are strongly related to each other, you can be plagued with problems of collinearity. We describe this problem in Chapter 17, and potential modeling solutions in Chapter 20.

» Check for inadvertent reverse-coding of the outcome variable: The outcome variable should always be coded as 1 for a yes outcome and 0 for a no outcome (refer to Table 18-1 for an example). If the variable in the data set is coded using characters, you should recode the outcome variable using the 0/1 coding. It is important that you do the coding yourself and do not leave it to an automated function in the program, because it may inadvertently reverse the coding so that 1 = no and 0 = yes. This reversal won’t affect any p values, but it will cause all your ORs and their CIs to be the reciprocals of what they would have been, meaning they will refer to the odds of no rather than the odds of yes.

» Don’t misinterpret odds ratios for categorical predictors: Categorical predictors should be coded numerically as we describe in Chapter 8. It is important to ensure that proper indicator variable coding is used, and that these variables are introduced properly in the model, as described in Chapter 17. Also, be careful not to misinterpret odds ratios for numerical predictors, and be mindful of the complete separation problem, as described in the following sections.
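To see the reverse-coding pitfall in numbers, here is a minimal Python sketch. It assumes the simplest possible case, a single binary predictor, where the OR from logistic regression equals the cross-product ratio of the 2×2 table and the Wald standard error of the log OR equals the square root of the summed reciprocal cell counts. The counts, the "yes"/"no" labels, and the function names are made up for illustration; they don't come from Table 18-1 or from any particular software package.

```python
import math

def recode_outcome(label):
    """Explicitly map a character outcome to 0/1 (1 = yes, 0 = no)."""
    mapping = {"yes": 1, "no": 0}
    # Raises KeyError on an unexpected label instead of guessing a code.
    return mapping[label.strip().lower()]

def odds_ratio_summary(exposed, outcome, z_crit=1.96):
    """Return (OR, CI lower, CI upper, two-sided Wald p value) for a 2x2 table."""
    pairs = list(zip(exposed, outcome))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)  # exposed, outcome = 1
    b = sum(1 for x, y in pairs if x == 1 and y == 0)  # exposed, outcome = 0
    c = sum(1 for x, y in pairs if x == 0 and y == 1)  # unexposed, outcome = 1
    d = sum(1 for x, y in pairs if x == 0 and y == 0)  # unexposed, outcome = 0
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)      # SE of the log OR
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value

# Hypothetical data: 50 exposed (30 yes, 20 no) and 50 unexposed (15 yes, 35 no).
labels = ["yes"] * 30 + ["no"] * 20 + ["yes"] * 15 + ["no"] * 35
exposed = [1] * 50 + [0] * 50

correct = [recode_outcome(s) for s in labels]  # 1 = yes, 0 = no
flipped = [1 - y for y in correct]             # accidental reversal: 1 = no

or_ok, lo_ok, hi_ok, p_ok = odds_ratio_summary(exposed, correct)
or_rev, lo_rev, hi_rev, p_rev = odds_ratio_summary(exposed, flipped)

print("Correct coding: OR, CI, p =", or_ok, (lo_ok, hi_ok), p_ok)
print("Flipped coding: OR, CI, p =", or_rev, (lo_rev, hi_rev), p_rev)
```

With the flipped coding, the OR comes out as the reciprocal of the correctly coded OR, the confidence limits become the reciprocals of the original limits with lower and upper swapped, and the p value stays the same. Doing the recoding through an explicit mapping, as in the hypothetical `recode_outcome` function, makes an unexpected label fail loudly rather than silently receiving a code.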